OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Text binarization in color documents

Internal identifier: 001000 (Main/Exploration); previous: 000F99; next: 001001

Authors: Efthimios Badekas [Greece]; Nikos Nikolaou [Greece]; Nikos Papamarkos [Greece]

RBID : ISTEX:259ED5DFC985771834D15F152D7C097A05F34BB5

English descriptors

Abstract

This article presents a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, a CC filtering procedure is applied in each color plane, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized using suitable binarization techniques, with the remaining pixels treated as background. The final result is a binary image that always contains black characters on a white background, independently of the original colors of each text block. The proposed approach can also be used for the binarization of noisy color (or gray‐scale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006
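
The first stages described in the abstract (assigning pixels to dominant colors, extracting connected components per color plane, and producing black text on a white background) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the dominant colors are already known (the paper estimates them with its own color reduction technique, which is not reproduced here), it assumes the text planes are given (the paper decides this through CC filtering, grouping, and the DOC property), and all function names are hypothetical.

```python
from collections import deque

def nearest_color(pixel, palette):
    """Index of the palette color closest to `pixel` (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])))

def color_planes(image, palette):
    """Assign every pixel to its nearest dominant color (assumed given)."""
    return [[nearest_color(px, palette) for px in row] for row in image]

def connected_components(labels, plane):
    """8-connected components of the pixels that belong to one color plane."""
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if labels[y][x] != plane or seen[y][x]:
                continue
            seen[y][x] = True
            queue, component = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx] and labels[ny][nx] == plane):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(component)
    return components

def binarize(labels, text_planes):
    """0 (black) for pixels in planes assumed to hold text, 1 (white) elsewhere."""
    return [[0 if lab in text_planes else 1 for lab in row] for row in labels]

# Toy 3x4 image: white background, a red stroke and a blue stroke.
WHITE, RED, BLUE = (255, 255, 255), (200, 30, 30), (20, 20, 180)
palette = [WHITE, RED, BLUE]  # assumed dominant colors
image = [
    [(250, 250, 250), (210, 40, 35), (255, 255, 255), (25, 15, 170)],
    [(255, 255, 255), (190, 25, 30), (255, 255, 255), (20, 20, 180)],
    [(255, 255, 255), (255, 255, 255), (255, 255, 255), (255, 255, 255)],
]
labels = color_planes(image, palette)
red_ccs = connected_components(labels, 1)
binary = binarize(labels, text_planes={1, 2})  # planes 1 and 2 assumed to be text
```

In this sketch both colored strokes end up black on white regardless of their original color, which mirrors the property the abstract states for the final binary image.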

DOI: 10.1002/ima.20092


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Text binarization in color documents</title>
<author>
<name sortKey="Badekas, Efthimios" sort="Badekas, Efthimios" uniqKey="Badekas E" first="Efthimios" last="Badekas">Efthimios Badekas</name>
</author>
<author>
<name sortKey="Nikolaou, Nikos" sort="Nikolaou, Nikos" uniqKey="Nikolaou N" first="Nikos" last="Nikolaou">Nikos Nikolaou</name>
</author>
<author>
<name sortKey="Papamarkos, Nikos" sort="Papamarkos, Nikos" uniqKey="Papamarkos N" first="Nikos" last="Papamarkos">Nikos Papamarkos</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:259ED5DFC985771834D15F152D7C097A05F34BB5</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1002/ima.20092</idno>
<idno type="url">https://api.istex.fr/document/259ED5DFC985771834D15F152D7C097A05F34BB5/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000224</idno>
<idno type="wicri:Area/Istex/Curation">000221</idno>
<idno type="wicri:Area/Istex/Checkpoint">000973</idno>
<idno type="wicri:doubleKey">0899-9457:2006:Badekas E:text:binarization:in</idno>
<idno type="wicri:Area/Main/Merge">001017</idno>
<idno type="wicri:Area/Main/Curation">001000</idno>
<idno type="wicri:Area/Main/Exploration">001000</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Text binarization in color documents</title>
<author>
<name sortKey="Badekas, Efthimios" sort="Badekas, Efthimios" uniqKey="Badekas E" first="Efthimios" last="Badekas">Efthimios Badekas</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Grèce</country>
<wicri:regionArea>Department of Electrical and Computer Engineering, Image Processing and Multimedia Laboratory, Democritus University of Thrace, 67100 Xanthi</wicri:regionArea>
<wicri:noRegion>67100 Xanthi</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Nikolaou, Nikos" sort="Nikolaou, Nikos" uniqKey="Nikolaou N" first="Nikos" last="Nikolaou">Nikos Nikolaou</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Grèce</country>
<wicri:regionArea>Department of Electrical and Computer Engineering, Image Processing and Multimedia Laboratory, Democritus University of Thrace, 67100 Xanthi</wicri:regionArea>
<wicri:noRegion>67100 Xanthi</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Papamarkos, Nikos" sort="Papamarkos, Nikos" uniqKey="Papamarkos N" first="Nikos" last="Papamarkos">Nikos Papamarkos</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Grèce</country>
<wicri:regionArea>Department of Electrical and Computer Engineering, Image Processing and Multimedia Laboratory, Democritus University of Thrace, 67100 Xanthi</wicri:regionArea>
<wicri:noRegion>67100 Xanthi</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2006">2006</date>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="262">262</biblScope>
<biblScope unit="page" to="274">274</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>
<idno type="istex">259ED5DFC985771834D15F152D7C097A05F34BB5</idno>
<idno type="DOI">10.1002/ima.20092</idno>
<idno type="ArticleID">IMA20092</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0899-9457</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>binarization</term>
<term>color quantization</term>
<term>document processing</term>
<term>segmentation</term>
<term>text localization</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This article presents a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them in order that the background and text components to become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, in each color plane a CC filtering procedure is applied which is followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed which are next redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized properly using suitable binarization techniques, considering the rest of the pixels as background. The final result is a binary image which contains always black characters in white background independently of the original colors of each text block. The proposed document binarization approach can also be used for binarization of noisy color (or gray‐scale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Grèce</li>
</country>
</list>
<tree>
<country name="Grèce">
<noRegion>
<name sortKey="Badekas, Efthimios" sort="Badekas, Efthimios" uniqKey="Badekas E" first="Efthimios" last="Badekas">Efthimios Badekas</name>
</noRegion>
<name sortKey="Nikolaou, Nikos" sort="Nikolaou, Nikos" uniqKey="Nikolaou N" first="Nikos" last="Nikolaou">Nikos Nikolaou</name>
<name sortKey="Papamarkos, Nikos" sort="Papamarkos, Nikos" uniqKey="Papamarkos N" first="Nikos" last="Papamarkos">Nikos Papamarkos</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001000 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001000 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:259ED5DFC985771834D15F152D7C097A05F34BB5
   |texte=   Text binarization in color documents
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024